INTRODUCTION
This project is about the analysis of FBI Gun-check, The Data sources are from FBI's National Instant Criminal Background Check System. The U.S. census data is found in a .csv file that contains several variables at the state level, and the NICS data is found in one sheet of an .xlsx file that contains the number of firearm checks by month, state, and type. we are going to explore the datasets to answer and get insight to the following Question;
what is the correlation between state and the gun purchasing order?. what is the overall trending of gun purchase?. what is state has the highest growth in Gun registration?.
# packages installation and loading of dataset
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly.express as px
# Load your data and print out a few lines. Perform operations to inspect data
# types and look for instances of missing or possibly errant data.
df_gun = pd.read_excel(r'C:\Users\k2k\Desktop\udacity alx projects\project 1\gun_data.xlsx')
df_census = pd.read_csv(r'C:\Users\k2k\Desktop\udacity alx projects\project 1\U.S. Census Data.csv')
#let check the top 5 row of df_gun
df_gun.head()
| month | state | permit | permit_recheck | handgun | long_gun | other | multiple | admin | prepawn_handgun | ... | returned_other | rentals_handgun | rentals_long_gun | private_sale_handgun | private_sale_long_gun | private_sale_other | return_to_seller_handgun | return_to_seller_long_gun | return_to_seller_other | totals | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2017-09 | Alabama | 16717.0 | 0.0 | 5734.0 | 6320.0 | 221.0 | 317 | 0.0 | 15.0 | ... | 0.0 | 0.0 | 0.0 | 9.0 | 16.0 | 3.0 | 0.0 | 0.0 | 3.0 | 32019 |
| 1 | 2017-09 | Alaska | 209.0 | 2.0 | 2320.0 | 2930.0 | 219.0 | 160 | 0.0 | 5.0 | ... | 0.0 | 0.0 | 0.0 | 17.0 | 24.0 | 1.0 | 0.0 | 0.0 | 0.0 | 6303 |
| 2 | 2017-09 | Arizona | 5069.0 | 382.0 | 11063.0 | 7946.0 | 920.0 | 631 | 0.0 | 13.0 | ... | 0.0 | 0.0 | 0.0 | 38.0 | 12.0 | 2.0 | 0.0 | 0.0 | 0.0 | 28394 |
| 3 | 2017-09 | Arkansas | 2935.0 | 632.0 | 4347.0 | 6063.0 | 165.0 | 366 | 51.0 | 12.0 | ... | 0.0 | 0.0 | 0.0 | 13.0 | 23.0 | 0.0 | 0.0 | 2.0 | 1.0 | 17747 |
| 4 | 2017-09 | California | 57839.0 | 0.0 | 37165.0 | 24581.0 | 2984.0 | 0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 123506 |
5 rows × 27 columns
#let check top 5 row of df_census
df_census.head()
| Fact | Fact Note | Alabama | Alaska | Arizona | Arkansas | California | Colorado | Connecticut | Delaware | ... | South Dakota | Tennessee | Texas | Utah | Vermont | Virginia | Washington | West Virginia | Wisconsin | Wyoming | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Population estimates, July 1, 2016, (V2016) | NaN | 4,863,300 | 741,894 | 6,931,071 | 2,988,248 | 39,250,017 | 5,540,545 | 3,576,452 | 952,065 | ... | 865454 | 6651194 | 27,862,596 | 3,051,217 | 624,594 | 8,411,808 | 7,288,000 | 1,831,102 | 5,778,708 | 585,501 |
| 1 | Population estimates base, April 1, 2010, (V2... | NaN | 4,780,131 | 710,249 | 6,392,301 | 2,916,025 | 37,254,522 | 5,029,324 | 3,574,114 | 897,936 | ... | 814195 | 6346298 | 25,146,100 | 2,763,888 | 625,741 | 8,001,041 | 6,724,545 | 1,853,011 | 5,687,289 | 563,767 |
| 2 | Population, percent change - April 1, 2010 (es... | NaN | 1.70% | 4.50% | 8.40% | 2.50% | 5.40% | 10.20% | 0.10% | 6.00% | ... | 0.063 | 0.048 | 10.80% | 10.40% | -0.20% | 5.10% | 8.40% | -1.20% | 1.60% | 3.90% |
| 3 | Population, Census, April 1, 2010 | NaN | 4,779,736 | 710,231 | 6,392,017 | 2,915,918 | 37,253,956 | 5,029,196 | 3,574,097 | 897,934 | ... | 814180 | 6346105 | 25,145,561 | 2,763,885 | 625,741 | 8,001,024 | 6,724,540 | 1,852,994 | 5,686,986 | 563,626 |
| 4 | Persons under 5 years, percent, July 1, 2016, ... | NaN | 6.00% | 7.30% | 6.30% | 6.40% | 6.30% | 6.10% | 5.20% | 5.80% | ... | 0.071 | 0.061 | 7.20% | 8.30% | 4.90% | 6.10% | 6.20% | 5.50% | 5.80% | 6.50% |
5 rows × 52 columns
#checking for duplicate in df_gun
df_gun.duplicated().sum()
0
#checking form missing value
df_gun.dropna().head()
| month | state | permit | permit_recheck | handgun | long_gun | other | multiple | admin | prepawn_handgun | ... | returned_other | rentals_handgun | rentals_long_gun | private_sale_handgun | private_sale_long_gun | private_sale_other | return_to_seller_handgun | return_to_seller_long_gun | return_to_seller_other | totals | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2017-09 | Alabama | 16717.0 | 0.0 | 5734.0 | 6320.0 | 221.0 | 317 | 0.0 | 15.0 | ... | 0.0 | 0.0 | 0.0 | 9.0 | 16.0 | 3.0 | 0.0 | 0.0 | 3.0 | 32019 |
| 1 | 2017-09 | Alaska | 209.0 | 2.0 | 2320.0 | 2930.0 | 219.0 | 160 | 0.0 | 5.0 | ... | 0.0 | 0.0 | 0.0 | 17.0 | 24.0 | 1.0 | 0.0 | 0.0 | 0.0 | 6303 |
| 2 | 2017-09 | Arizona | 5069.0 | 382.0 | 11063.0 | 7946.0 | 920.0 | 631 | 0.0 | 13.0 | ... | 0.0 | 0.0 | 0.0 | 38.0 | 12.0 | 2.0 | 0.0 | 0.0 | 0.0 | 28394 |
| 3 | 2017-09 | Arkansas | 2935.0 | 632.0 | 4347.0 | 6063.0 | 165.0 | 366 | 51.0 | 12.0 | ... | 0.0 | 0.0 | 0.0 | 13.0 | 23.0 | 0.0 | 0.0 | 2.0 | 1.0 | 17747 |
| 4 | 2017-09 | California | 57839.0 | 0.0 | 37165.0 | 24581.0 | 2984.0 | 0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 123506 |
5 rows × 27 columns
#assigning correct data types
df_gun['month']= pd.to_datetime(df_gun['month'])
df_census.head()
| Fact | Fact Note | Alabama | Alaska | Arizona | Arkansas | California | Colorado | Connecticut | Delaware | ... | South Dakota | Tennessee | Texas | Utah | Vermont | Virginia | Washington | West Virginia | Wisconsin | Wyoming | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Population estimates, July 1, 2016, (V2016) | NaN | 4,863,300 | 741,894 | 6,931,071 | 2,988,248 | 39,250,017 | 5,540,545 | 3,576,452 | 952,065 | ... | 865454 | 6651194 | 27,862,596 | 3,051,217 | 624,594 | 8,411,808 | 7,288,000 | 1,831,102 | 5,778,708 | 585,501 |
| 1 | Population estimates base, April 1, 2010, (V2... | NaN | 4,780,131 | 710,249 | 6,392,301 | 2,916,025 | 37,254,522 | 5,029,324 | 3,574,114 | 897,936 | ... | 814195 | 6346298 | 25,146,100 | 2,763,888 | 625,741 | 8,001,041 | 6,724,545 | 1,853,011 | 5,687,289 | 563,767 |
| 2 | Population, percent change - April 1, 2010 (es... | NaN | 1.70% | 4.50% | 8.40% | 2.50% | 5.40% | 10.20% | 0.10% | 6.00% | ... | 0.063 | 0.048 | 10.80% | 10.40% | -0.20% | 5.10% | 8.40% | -1.20% | 1.60% | 3.90% |
| 3 | Population, Census, April 1, 2010 | NaN | 4,779,736 | 710,231 | 6,392,017 | 2,915,918 | 37,253,956 | 5,029,196 | 3,574,097 | 897,934 | ... | 814180 | 6346105 | 25,145,561 | 2,763,885 | 625,741 | 8,001,024 | 6,724,540 | 1,852,994 | 5,686,986 | 563,626 |
| 4 | Persons under 5 years, percent, July 1, 2016, ... | NaN | 6.00% | 7.30% | 6.30% | 6.40% | 6.30% | 6.10% | 5.20% | 5.80% | ... | 0.071 | 0.061 | 7.20% | 8.30% | 4.90% | 6.10% | 6.20% | 5.50% | 5.80% | 6.50% |
5 rows × 52 columns
df_census = df_census[:64]
df_census
| Fact | Fact Note | Alabama | Alaska | Arizona | Arkansas | California | Colorado | Connecticut | Delaware | ... | South Dakota | Tennessee | Texas | Utah | Vermont | Virginia | Washington | West Virginia | Wisconsin | Wyoming | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Population estimates, July 1, 2016, (V2016) | NaN | 4,863,300 | 741,894 | 6,931,071 | 2,988,248 | 39,250,017 | 5,540,545 | 3,576,452 | 952,065 | ... | 865454 | 6651194 | 27,862,596 | 3,051,217 | 624,594 | 8,411,808 | 7,288,000 | 1,831,102 | 5,778,708 | 585,501 |
| 1 | Population estimates base, April 1, 2010, (V2... | NaN | 4,780,131 | 710,249 | 6,392,301 | 2,916,025 | 37,254,522 | 5,029,324 | 3,574,114 | 897,936 | ... | 814195 | 6346298 | 25,146,100 | 2,763,888 | 625,741 | 8,001,041 | 6,724,545 | 1,853,011 | 5,687,289 | 563,767 |
| 2 | Population, percent change - April 1, 2010 (es... | NaN | 1.70% | 4.50% | 8.40% | 2.50% | 5.40% | 10.20% | 0.10% | 6.00% | ... | 0.063 | 0.048 | 10.80% | 10.40% | -0.20% | 5.10% | 8.40% | -1.20% | 1.60% | 3.90% |
| 3 | Population, Census, April 1, 2010 | NaN | 4,779,736 | 710,231 | 6,392,017 | 2,915,918 | 37,253,956 | 5,029,196 | 3,574,097 | 897,934 | ... | 814180 | 6346105 | 25,145,561 | 2,763,885 | 625,741 | 8,001,024 | 6,724,540 | 1,852,994 | 5,686,986 | 563,626 |
| 4 | Persons under 5 years, percent, July 1, 2016, ... | NaN | 6.00% | 7.30% | 6.30% | 6.40% | 6.30% | 6.10% | 5.20% | 5.80% | ... | 0.071 | 0.061 | 7.20% | 8.30% | 4.90% | 6.10% | 6.20% | 5.50% | 5.80% | 6.50% |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 59 | Nonminority-owned firms, 2012 | NaN | 272,651 | 51,147 | 344,981 | 189,029 | 1,819,107 | 442,365 | 259,614 | 54,782 | ... | 74228 | 434025 | 1,224,845 | 218,826 | 70,491 | 450,109 | 426,697 | 104,785 | 379,934 | 55,397 |
| 60 | Veteran-owned firms, 2012 | NaN | 41,943 | 7,953 | 46,780 | 25,915 | 252,377 | 51,722 | 31,056 | 7,206 | ... | 8604 | 59379 | 213,590 | 18,754 | 8,237 | 76,434 | 49,331 | 12,912 | 39,830 | 6,470 |
| 61 | Nonveteran-owned firms, 2012 | NaN | 316,984 | 56,091 | 427,582 | 192,988 | 3,176,341 | 469,524 | 281,182 | 60,318 | ... | 66219 | 469392 | 2,057,218 | 219,807 | 63,317 | 548,439 | 461,401 | 94,960 | 370,755 | 51,353 |
| 62 | Population per square mile, 2010 | NaN | 94.4 | 1.2 | 56.3 | 56 | 239.1 | 48.5 | 738.1 | 460.8 | ... | 10.7 | 153.9 | 96.3 | 33.6 | 67.9 | 202.6 | 101.2 | 77.1 | 105 | 5.8 |
| 63 | Land area in square miles, 2010 | NaN | 50,645.33 | 570,640.95 | 113,594.08 | 52,035.48 | 155,779.22 | 103,641.89 | 4,842.36 | 1,948.54 | ... | 75811 | 41234.9 | 261,231.71 | 82,169.62 | 9,216.66 | 39,490.09 | 66,455.52 | 24,038.21 | 54,157.80 | 97,093.14 |
64 rows × 52 columns
df_census.rename(columns=df_census.iloc[0]).drop(df_census.index[0])
df_census.head()
| Fact | Fact Note | Alabama | Alaska | Arizona | Arkansas | California | Colorado | Connecticut | Delaware | ... | South Dakota | Tennessee | Texas | Utah | Vermont | Virginia | Washington | West Virginia | Wisconsin | Wyoming | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Population estimates, July 1, 2016, (V2016) | NaN | 4,863,300 | 741,894 | 6,931,071 | 2,988,248 | 39,250,017 | 5,540,545 | 3,576,452 | 952,065 | ... | 865454 | 6651194 | 27,862,596 | 3,051,217 | 624,594 | 8,411,808 | 7,288,000 | 1,831,102 | 5,778,708 | 585,501 |
| 1 | Population estimates base, April 1, 2010, (V2... | NaN | 4,780,131 | 710,249 | 6,392,301 | 2,916,025 | 37,254,522 | 5,029,324 | 3,574,114 | 897,936 | ... | 814195 | 6346298 | 25,146,100 | 2,763,888 | 625,741 | 8,001,041 | 6,724,545 | 1,853,011 | 5,687,289 | 563,767 |
| 2 | Population, percent change - April 1, 2010 (es... | NaN | 1.70% | 4.50% | 8.40% | 2.50% | 5.40% | 10.20% | 0.10% | 6.00% | ... | 0.063 | 0.048 | 10.80% | 10.40% | -0.20% | 5.10% | 8.40% | -1.20% | 1.60% | 3.90% |
| 3 | Population, Census, April 1, 2010 | NaN | 4,779,736 | 710,231 | 6,392,017 | 2,915,918 | 37,253,956 | 5,029,196 | 3,574,097 | 897,934 | ... | 814180 | 6346105 | 25,145,561 | 2,763,885 | 625,741 | 8,001,024 | 6,724,540 | 1,852,994 | 5,686,986 | 563,626 |
| 4 | Persons under 5 years, percent, July 1, 2016, ... | NaN | 6.00% | 7.30% | 6.30% | 6.40% | 6.30% | 6.10% | 5.20% | 5.80% | ... | 0.071 | 0.061 | 7.20% | 8.30% | 4.90% | 6.10% | 6.20% | 5.50% | 5.80% | 6.50% |
5 rows × 52 columns
#transposing df_census
df_census.set_index('Fact', inplace=True)
df_census = df_census.transpose().reset_index()
df_census.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 51 entries, 0 to 50 Data columns (total 65 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 index 51 non-null object 1 Population estimates, July 1, 2016, (V2016) 50 non-null object 2 Population estimates base, April 1, 2010, (V2016) 50 non-null object 3 Population, percent change - April 1, 2010 (estimates base) to July 1, 2016, (V2016) 50 non-null object 4 Population, Census, April 1, 2010 50 non-null object 5 Persons under 5 years, percent, July 1, 2016, (V2016) 50 non-null object 6 Persons under 5 years, percent, April 1, 2010 50 non-null object 7 Persons under 18 years, percent, July 1, 2016, (V2016) 50 non-null object 8 Persons under 18 years, percent, April 1, 2010 50 non-null object 9 Persons 65 years and over, percent, July 1, 2016, (V2016) 50 non-null object 10 Persons 65 years and over, percent, April 1, 2010 50 non-null object 11 Female persons, percent, July 1, 2016, (V2016) 50 non-null object 12 Female persons, percent, April 1, 2010 50 non-null object 13 White alone, percent, July 1, 2016, (V2016) 51 non-null object 14 Black or African American alone, percent, July 1, 2016, (V2016) 51 non-null object 15 American Indian and Alaska Native alone, percent, July 1, 2016, (V2016) 51 non-null object 16 Asian alone, percent, July 1, 2016, (V2016) 51 non-null object 17 Native Hawaiian and Other Pacific Islander alone, percent, July 1, 2016, (V2016) 51 non-null object 18 Two or More Races, percent, July 1, 2016, (V2016) 50 non-null object 19 Hispanic or Latino, percent, July 1, 2016, (V2016) 51 non-null object 20 White alone, not Hispanic or Latino, percent, July 1, 2016, (V2016) 50 non-null object 21 Veterans, 2011-2015 50 non-null object 22 Foreign born persons, percent, 2011-2015 50 non-null object 23 Housing units, July 1, 2016, (V2016) 50 non-null object 24 Housing units, April 1, 2010 50 non-null object 25 Owner-occupied housing unit rate, 2011-2015 50 non-null object 26 Median value of owner-occupied housing units, 2011-2015 50 non-null object 27 Median selected monthly owner costs -with a mortgage, 2011-2015 50 non-null object 28 Median selected monthly owner costs -without a mortgage, 2011-2015 50 non-null object 29 Median gross rent, 2011-2015 50 non-null object 30 Building permits, 2016 50 non-null object 31 Households, 2011-2015 50 non-null object 32 Persons per household, 2011-2015 50 non-null object 33 Living in same house 1 year ago, percent of persons age 1 year+, 2011-2015 50 non-null object 34 Language other than English spoken at home, percent of persons age 5 years+, 2011-2015 50 non-null object 35 High school graduate or higher, percent of persons age 25 years+, 2011-2015 50 non-null object 36 Bachelor's degree or higher, percent of persons age 25 years+, 2011-2015 50 non-null object 37 With a disability, under age 65 years, percent, 2011-2015 50 non-null object 38 Persons without health insurance, under age 65 years, percent 50 non-null object 39 In civilian labor force, total, percent of population age 16 years+, 2011-2015 50 non-null object 40 In civilian labor force, female, percent of population age 16 years+, 2011-2015 50 non-null object 41 Total accommodation and food services sales, 2012 ($1,000) 51 non-null object 42 Total health care and social assistance receipts/revenue, 2012 ($1,000) 51 non-null object 43 Total manufacturers shipments, 2012 ($1,000) 51 non-null object 44 Total merchant wholesaler sales, 2012 ($1,000) 51 non-null object 45 Total retail sales, 2012 ($1,000) 51 non-null object 46 Total retail sales per capita, 2012 51 non-null object 47 Mean travel time to work (minutes), workers age 16 years+, 2011-2015 50 non-null object 48 Median household income (in 2015 dollars), 2011-2015 50 non-null object 49 Per capita income in past 12 months (in 2015 dollars), 2011-2015 50 non-null object 50 Persons in poverty, percent 50 non-null object 51 Total employer establishments, 2015 51 non-null object 52 Total employment, 2015 51 non-null object 53 Total annual payroll, 2015 ($1,000) 51 non-null object 54 Total employment, percent change, 2014-2015 51 non-null object 55 Total nonemployer establishments, 2015 50 non-null object 56 All firms, 2012 50 non-null object 57 Men-owned firms, 2012 50 non-null object 58 Women-owned firms, 2012 50 non-null object 59 Minority-owned firms, 2012 50 non-null object 60 Nonminority-owned firms, 2012 50 non-null object 61 Veteran-owned firms, 2012 50 non-null object 62 Nonveteran-owned firms, 2012 50 non-null object 63 Population per square mile, 2010 50 non-null object 64 Land area in square miles, 2010 50 non-null object dtypes: object(65) memory usage: 26.0+ KB
#dropping the first row (fact_note)
df_census = df_census.drop([0])
#dropping unwanted columns in df_census
df_census = df_census.drop(df_census.iloc[:,65:], axis = 1)
#checking for duplicate in df_census
df_census.duplicated().sum()
0
#checking for missing values in df_census
df_census.isnull().sum()
Fact
index 0
Population estimates, July 1, 2016, (V2016) 0
Population estimates base, April 1, 2010, (V2016) 0
Population, percent change - April 1, 2010 (estimates base) to July 1, 2016, (V2016) 0
Population, Census, April 1, 2010 0
..
Nonminority-owned firms, 2012 0
Veteran-owned firms, 2012 0
Nonveteran-owned firms, 2012 0
Population per square mile, 2010 0
Land area in square miles, 2010 0
Length: 65, dtype: int64
#removing %,$ and comma from the data set
col = df_census.columns
df_census[col] = df_census[col].replace({'\$': '', ',': '','\%':''}, regex=True)
#creating a purchase_trend dataframe.
purchase_trend = pd.pivot_table(data = df_gun, index = 'month', columns= 'state', values= 'totals')
purchase_trend = purchase_trend.reset_index()
purchase_trend['total'] = purchase_trend.iloc[:,1:].sum(axis = 1)
purchase_trend
| state | month | Alabama | Alaska | Arizona | Arkansas | California | Colorado | Connecticut | Delaware | District of Columbia | ... | Texas | Utah | Vermont | Virgin Islands | Virginia | Washington | West Virginia | Wisconsin | Wyoming | total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1998-11-01 | 1062 | 145 | 379 | 589 | 2101 | 622 | 80 | 55 | 0 | ... | 2794 | 267 | 59 | 0 | 24 | 361 | 408 | 241 | 107 | 21176 |
| 1 | 1998-12-01 | 35506 | 3840 | 17074 | 21163 | 65344 | 23176 | 6790 | 2080 | 0 | ... | 79605 | 10415 | 2057 | 0 | 25170 | 11641 | 13786 | 15201 | 3379 | 870722 |
| 2 | 1999-01-01 | 18049 | 2278 | 12859 | 11953 | 56953 | 19503 | 6265 | 1128 | 1 | ... | 50992 | 5055 | 1043 | 0 | 14009 | 8695 | 8260 | 7780 | 2180 | 585974 |
| 3 | 1999-02-01 | 20583 | 2413 | 14546 | 15348 | 57471 | 22239 | 8069 | 1077 | 3 | ... | 55148 | 5933 | 1668 | 0 | 16053 | 9383 | 11206 | 10578 | 2643 | 690215 |
| 4 | 1999-03-01 | 19424 | 3206 | 14992 | 13720 | 68327 | 17287 | 7877 | 1314 | 2 | ... | 54096 | 6021 | 1941 | 0 | 17193 | 10551 | 10867 | 14891 | 2776 | 741687 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 222 | 2017-05-01 | 33331 | 6442 | 28734 | 16749 | 129307 | 36008 | 16149 | 4012 | 82 | ... | 109732 | 23844 | 2577 | 79 | 36007 | 45816 | 15085 | 37888 | 4128 | 1898840 |
| 223 | 2017-06-01 | 35687 | 6353 | 28359 | 16614 | 130187 | 36459 | 16504 | 3634 | 74 | ... | 118091 | 22138 | 2382 | 153 | 36838 | 46023 | 14498 | 36267 | 3975 | 1888266 |
| 224 | 2017-07-01 | 31474 | 6487 | 25714 | 14015 | 114595 | 33811 | 14525 | 2981 | 68 | ... | 98338 | 17925 | 2275 | 151 | 36183 | 38369 | 13970 | 32846 | 3398 | 1733308 |
| 225 | 2017-08-01 | 35038 | 6946 | 29376 | 17517 | 130901 | 37150 | 14019 | 3509 | 80 | ... | 110946 | 22010 | 2676 | 132 | 39114 | 42120 | 14770 | 40924 | 4111 | 1896457 |
| 226 | 2017-09-01 | 32019 | 6303 | 28394 | 17747 | 123506 | 35873 | 12117 | 3502 | 61 | ... | 128260 | 20041 | 3084 | 9 | 36446 | 43049 | 16723 | 37506 | 4281 | 1857226 |
227 rows × 57 columns
#plotting total purchase trend line graph
plt.figure(figsize=(40,20))
# Preparing the data to subplots
x = purchase_trend['month']
y = purchase_trend['total']
# Plot the subplots
plt.subplot(2, 2, 1)
plt.plot(x, y, 'g')
# set chart title
plt.title("total purchase trend" , fontsize=20)
#set Axis label
plt.xlabel("Year", fontsize=20)
plt.ylabel("total purchase", fontsize=20)
plt.show()
df_gun
| month | state | permit | permit_recheck | handgun | long_gun | other | multiple | admin | prepawn_handgun | ... | returned_other | rentals_handgun | rentals_long_gun | private_sale_handgun | private_sale_long_gun | private_sale_other | return_to_seller_handgun | return_to_seller_long_gun | return_to_seller_other | totals | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2017-09-01 | Alabama | 16717.0 | 0.0 | 5734.0 | 6320.0 | 221.0 | 317 | 0.0 | 15.0 | ... | 0.0 | 0.0 | 0.0 | 9.0 | 16.0 | 3.0 | 0.0 | 0.0 | 3.0 | 32019 |
| 1 | 2017-09-01 | Alaska | 209.0 | 2.0 | 2320.0 | 2930.0 | 219.0 | 160 | 0.0 | 5.0 | ... | 0.0 | 0.0 | 0.0 | 17.0 | 24.0 | 1.0 | 0.0 | 0.0 | 0.0 | 6303 |
| 2 | 2017-09-01 | Arizona | 5069.0 | 382.0 | 11063.0 | 7946.0 | 920.0 | 631 | 0.0 | 13.0 | ... | 0.0 | 0.0 | 0.0 | 38.0 | 12.0 | 2.0 | 0.0 | 0.0 | 0.0 | 28394 |
| 3 | 2017-09-01 | Arkansas | 2935.0 | 632.0 | 4347.0 | 6063.0 | 165.0 | 366 | 51.0 | 12.0 | ... | 0.0 | 0.0 | 0.0 | 13.0 | 23.0 | 0.0 | 0.0 | 2.0 | 1.0 | 17747 |
| 4 | 2017-09-01 | California | 57839.0 | 0.0 | 37165.0 | 24581.0 | 2984.0 | 0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 123506 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 12480 | 1998-11-01 | Virginia | 0.0 | NaN | 14.0 | 2.0 | NaN | 8 | 0.0 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 24 |
| 12481 | 1998-11-01 | Washington | 1.0 | NaN | 65.0 | 286.0 | NaN | 8 | 1.0 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 361 |
| 12482 | 1998-11-01 | West Virginia | 3.0 | NaN | 149.0 | 251.0 | NaN | 5 | 0.0 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 408 |
| 12483 | 1998-11-01 | Wisconsin | 0.0 | NaN | 25.0 | 214.0 | NaN | 2 | 0.0 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 241 |
| 12484 | 1998-11-01 | Wyoming | 8.0 | NaN | 45.0 | 49.0 | NaN | 5 | 0.0 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 107 |
12485 rows × 27 columns
#creating a new data set form df_gun
df_gun_growth = df_gun.loc[:, ['month', 'state', 'totals']]
df_gun_growth.reset_index()
| index | month | state | totals | |
|---|---|---|---|---|
| 0 | 0 | 2017-09-01 | Alabama | 32019 |
| 1 | 1 | 2017-09-01 | Alaska | 6303 |
| 2 | 2 | 2017-09-01 | Arizona | 28394 |
| 3 | 3 | 2017-09-01 | Arkansas | 17747 |
| 4 | 4 | 2017-09-01 | California | 123506 |
| ... | ... | ... | ... | ... |
| 12480 | 12480 | 1998-11-01 | Virginia | 24 |
| 12481 | 12481 | 1998-11-01 | Washington | 361 |
| 12482 | 12482 | 1998-11-01 | West Virginia | 408 |
| 12483 | 12483 | 1998-11-01 | Wisconsin | 241 |
| 12484 | 12484 | 1998-11-01 | Wyoming | 107 |
12485 rows × 4 columns
#extracting year from the month column
df_gun_growth['year']= pd.to_datetime(df_gun_growth['month']).dt.strftime('%Y')
df_gun_growth['months']= pd.to_datetime(df_gun_growth['month']).dt.strftime('%m')
#dropping the month (date column)
df_gun_growth = df_gun_growth.drop(columns = 'month')
#extracting data for 1998 and 2017
df_gun_1998 = df_gun_growth.query('year == "1998"')
df_gun_2017 = df_gun_growth.query('year == "2017"')
#grouping by df_gun_1998 state
df_gun_1998 = df_gun_1998.groupby(['state'])['totals'].sum().reset_index()
#rename totals as 2017totals
df_gun_1998 = df_gun_1998.rename(columns = {'totals': '1998totals'})
#grouping by df_gun_1998 state
df_gun_2017 = df_gun_2017.groupby(['state'])['totals'].sum().reset_index()
#rename totals as 2017totals
df_gun_2017 = df_gun_2017.rename(columns = {'totals': '2017totals'})
#merging th two dataset
df_growth = df_gun_2017.merge(df_gun_1998, how = 'inner', on = 'state')
df_growth
| state | 2017totals | 1998totals | |
|---|---|---|---|
| 0 | Alabama | 336763 | 36568 |
| 1 | Alaska | 56902 | 3985 |
| 2 | Arizona | 279553 | 17453 |
| 3 | Arkansas | 161694 | 21752 |
| 4 | California | 1167528 | 67445 |
| 5 | Colorado | 345559 | 23798 |
| 6 | Connecticut | 138789 | 6870 |
| 7 | Delaware | 36826 | 2135 |
| 8 | District of Columbia | 754 | 0 |
| 9 | Florida | 912765 | 32341 |
| 10 | Georgia | 378921 | 3819 |
| 11 | Guam | 1372 | 9 |
| 12 | Hawaii | 9217 | 431 |
| 13 | Idaho | 131447 | 7339 |
| 14 | Illinois | 1163945 | 41477 |
| 15 | Indiana | 606881 | 22518 |
| 16 | Iowa | 110551 | 14650 |
| 17 | Kansas | 132840 | 12386 |
| 18 | Kentucky | 3417976 | 31869 |
| 19 | Louisiana | 224545 | 27642 |
| 20 | Maine | 68320 | 4106 |
| 21 | Mariana Islands | 106 | 0 |
| 22 | Maryland | 112680 | 9007 |
| 23 | Massachusetts | 154161 | 1443 |
| 24 | Michigan | 369053 | 30596 |
| 25 | Minnesota | 529302 | 12116 |
| 26 | Mississippi | 169056 | 26097 |
| 27 | Missouri | 362047 | 24087 |
| 28 | Montana | 88179 | 6406 |
| 29 | Nebraska | 50519 | 6655 |
| 30 | Nevada | 93734 | 6439 |
| 31 | New Hampshire | 95057 | 2442 |
| 32 | New Jersey | 77113 | 3795 |
| 33 | New Mexico | 111491 | 9033 |
| 34 | New York | 276227 | 14736 |
| 35 | North Carolina | 385554 | 38943 |
| 36 | North Dakota | 47906 | 2437 |
| 37 | Ohio | 540654 | 32481 |
| 38 | Oklahoma | 228878 | 20075 |
| 39 | Oregon | 240741 | 14024 |
| 40 | Pennsylvania | 763378 | 48317 |
| 41 | Puerto Rico | 12431 | 513 |
| 42 | Rhode Island | 18132 | 1059 |
| 43 | South Carolina | 295532 | 9565 |
| 44 | South Dakota | 68914 | 2751 |
| 45 | Tennessee | 549639 | 24773 |
| 46 | Texas | 1074971 | 82399 |
| 47 | Utah | 199279 | 10682 |
| 48 | Vermont | 27148 | 2116 |
| 49 | Virgin Islands | 925 | 0 |
| 50 | Virginia | 366994 | 25194 |
| 51 | Washington | 419522 | 12002 |
| 52 | West Virginia | 152784 | 14194 |
| 53 | Wisconsin | 387537 | 15442 |
| 54 | Wyoming | 37736 | 3486 |
#adding growth rate
df_growth['growth_rate'] = (df_growth['2017totals']- df_growth['1998totals']) / df_growth['1998totals']
df_growth = df_growth.sort_values(['growth_rate'], ascending = False).reset_index()
#removing the umwanted rows
df_growth = df_growth.drop(labels = [0,1,2], axis = 0)
df_growth= df_growth.drop(columns = {'index'}, axis= 0).reset_index()
df_growth
| index | state | 2017totals | 1998totals | growth_rate | |
|---|---|---|---|---|---|
| 0 | 3 | Guam | 1372 | 9 | 151.444444 |
| 1 | 4 | Kentucky | 3417976 | 31869 | 106.250808 |
| 2 | 5 | Massachusetts | 154161 | 1443 | 105.833680 |
| 3 | 6 | Georgia | 378921 | 3819 | 98.219953 |
| 4 | 7 | Minnesota | 529302 | 12116 | 42.686200 |
| 5 | 8 | New Hampshire | 95057 | 2442 | 37.925880 |
| 6 | 9 | Washington | 419522 | 12002 | 33.954341 |
| 7 | 10 | South Carolina | 295532 | 9565 | 29.897229 |
| 8 | 11 | Florida | 912765 | 32341 | 27.223153 |
| 9 | 12 | Illinois | 1163945 | 41477 | 27.062420 |
| 10 | 13 | Indiana | 606881 | 22518 | 25.950928 |
| 11 | 14 | Wisconsin | 387537 | 15442 | 24.096296 |
| 12 | 15 | South Dakota | 68914 | 2751 | 24.050527 |
| 13 | 16 | Puerto Rico | 12431 | 513 | 23.231969 |
| 14 | 17 | Tennessee | 549639 | 24773 | 21.187018 |
| 15 | 18 | Hawaii | 9217 | 431 | 20.385151 |
| 16 | 19 | New Jersey | 77113 | 3795 | 19.319631 |
| 17 | 20 | Connecticut | 138789 | 6870 | 19.202183 |
| 18 | 21 | North Dakota | 47906 | 2437 | 18.657776 |
| 19 | 22 | New York | 276227 | 14736 | 17.745046 |
| 20 | 23 | Utah | 199279 | 10682 | 17.655589 |
| 21 | 24 | Idaho | 131447 | 7339 | 16.910751 |
| 22 | 25 | California | 1167528 | 67445 | 16.310816 |
| 23 | 26 | Delaware | 36826 | 2135 | 16.248712 |
| 24 | 27 | Oregon | 240741 | 14024 | 16.166358 |
| 25 | 28 | Rhode Island | 18132 | 1059 | 16.121813 |
| 26 | 29 | Ohio | 540654 | 32481 | 15.645239 |
| 27 | 30 | Maine | 68320 | 4106 | 15.639065 |
| 28 | 31 | Arizona | 279553 | 17453 | 15.017476 |
| 29 | 32 | Pennsylvania | 763378 | 48317 | 14.799367 |
| 30 | 33 | Missouri | 362047 | 24087 | 14.030805 |
| 31 | 34 | Virginia | 366994 | 25194 | 13.566722 |
| 32 | 35 | Nevada | 93734 | 6439 | 13.557229 |
| 33 | 36 | Colorado | 345559 | 23798 | 13.520506 |
| 34 | 37 | Alaska | 56902 | 3985 | 13.279046 |
| 35 | 38 | Montana | 88179 | 6406 | 12.765064 |
| 36 | 39 | Texas | 1074971 | 82399 | 12.045923 |
| 37 | 40 | Vermont | 27148 | 2116 | 11.829868 |
| 38 | 41 | Maryland | 112680 | 9007 | 11.510270 |
| 39 | 42 | New Mexico | 111491 | 9033 | 11.342633 |
| 40 | 43 | Michigan | 369053 | 30596 | 11.062132 |
| 41 | 44 | Oklahoma | 228878 | 20075 | 10.401146 |
| 42 | 45 | Wyoming | 37736 | 3486 | 9.825014 |
| 43 | 46 | West Virginia | 152784 | 14194 | 9.763985 |
| 44 | 47 | Kansas | 132840 | 12386 | 9.725012 |
| 45 | 48 | North Carolina | 385554 | 38943 | 8.900470 |
| 46 | 49 | Alabama | 336763 | 36568 | 8.209227 |
| 47 | 50 | Louisiana | 224545 | 27642 | 7.123327 |
| 48 | 51 | Nebraska | 50519 | 6655 | 6.591134 |
| 49 | 52 | Iowa | 110551 | 14650 | 6.546143 |
| 50 | 53 | Arkansas | 161694 | 21752 | 6.433523 |
| 51 | 54 | Mississippi | 169056 | 26097 | 5.477986 |
df_growth.query('state == "Guam"')
| index | state | 2017totals | 1998totals | growth_rate | |
|---|---|---|---|---|---|
| 0 | 3 | Guam | 1372 | 9 | 151.444444 |
plt.figure(figsize=(20,10))
def growth_rate():
return (df_growth['2017totals']- df_growth['1998totals']) / df_growth['1998totals']
plt.bar(df_growth['state'].head(10), df_growth['growth_rate'].head(10))
plt.ylabel("% of growth", fontsize=20)
plt.xlabel("State", fontsize=20)
# set chart title
plt.title("Top Ten highest growth in gun registration by state ", fontsize=20)
Text(0.5, 1.0, 'Top Ten highest growth in gun registration by state ')
As seen in the chart above, Guam has the highest growth in gun registration follow by kentucy and Massacusetts.
The cenusus data provide data for only 2010 and 2016, so i will extract the data for 2010 and 2016 from df_gun data, then merge with census data using state as the foriegn key to get correlation for the data.
#creating a dfnew_gun
df_new_gun= df_gun.loc[:, ['month', 'state', 'totals']]
df_new_gun
| month | state | totals | |
|---|---|---|---|
| 0 | 2017-09-01 | Alabama | 32019 |
| 1 | 2017-09-01 | Alaska | 6303 |
| 2 | 2017-09-01 | Arizona | 28394 |
| 3 | 2017-09-01 | Arkansas | 17747 |
| 4 | 2017-09-01 | California | 123506 |
| ... | ... | ... | ... |
| 12480 | 1998-11-01 | Virginia | 24 |
| 12481 | 1998-11-01 | Washington | 361 |
| 12482 | 1998-11-01 | West Virginia | 408 |
| 12483 | 1998-11-01 | Wisconsin | 241 |
| 12484 | 1998-11-01 | Wyoming | 107 |
12485 rows × 3 columns
#extracting year and month from month column
df_new_gun['year']= pd.to_datetime(df_new_gun['month']).dt.strftime('%Y')
df_new_gun['months']= pd.to_datetime(df_new_gun['month']).dt.strftime('%m')
#dropping the month (date column)
df_new_gun = df_new_gun.drop(columns = 'month')
#extraction 2010-2016 data
df_new_gun = df_new_gun.query('year == "2010"'and 'year == "2016"')
#grouping df_new_gun by state
df_new_gun = df_new_gun.groupby('state')['totals'].sum().reset_index()
df_new_gun.head()
| state | totals | |
|---|---|---|
| 0 | Alabama | 616947 |
| 1 | Alaska | 87647 |
| 2 | Arizona | 416279 |
| 3 | Arkansas | 266014 |
| 4 | California | 2377167 |
#extracting census data for 2016 and 2010
#renameing the column as state
df_census_2016_2010 = df_census.iloc[:,0:4]
df_census_2016_2010.rename(columns = {'index':'state','Population estimates, July 1, 2016, (V2016)' :'population_2016', 'Population estimates base, April 1, 2010, (V2016)': 'population_2010' , 'Population, percent change - April 1, 2010 (estimates base) to July 1, 2016, (V2016)': '%population_change2010and2016'}, inplace = True)
df_census_2016_2010.head()
| Fact | state | population_2016 | population_2010 | %population_change2010and2016 |
|---|---|---|---|---|
| 1 | Alabama | 4863300 | 4780131 | 1.70 |
| 2 | Alaska | 741894 | 710249 | 4.50 |
| 3 | Arizona | 6931071 | 6392301 | 8.40 |
| 4 | Arkansas | 2988248 | 2916025 | 2.50 |
| 5 | California | 39250017 | 37254522 | 5.40 |
#merging the df_new_gun and df_census
df = df_census_2016_2010.merge(df_new_gun, how = 'inner', on = 'state')
df.info()
<class 'pandas.core.frame.DataFrame'> Int64Index: 50 entries, 0 to 49 Data columns (total 5 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 state 50 non-null object 1 population_2016 50 non-null object 2 population_2010 50 non-null object 3 %population_change2010and2016 50 non-null object 4 totals 50 non-null int64 dtypes: int64(1), object(4) memory usage: 2.3+ KB
#changing the datatype to float
df['population_2016'] = df['population_2016'].astype(float)
df['population_2010'] = df['population_2010'].astype(float)
df['%population_change2010and2016'] = df['%population_change2010and2016'].astype(float)
#creating correlation table
#df_corr = df.corr()[['totals']][:-1].sort_values(by='totals', ascending = True)
df_corr = df.corr()
df_corr
| population_2016 | population_2010 | %population_change2010and2016 | totals | |
|---|---|---|---|---|
| population_2016 | 1.000000 | 0.999079 | 0.232673 | 0.605044 |
| population_2010 | 0.999079 | 1.000000 | 0.205972 | 0.607005 |
| %population_change2010and2016 | 0.232673 | 0.205972 | 1.000000 | 0.098395 |
| totals | 0.605044 | 0.607005 | 0.098395 | 1.000000 |
#plotting the correlation
hm = px.imshow(df_corr, height = 1000, text_auto=True, aspect="auto")
hm.update_layout(title = 'Heatmap of census data correlation with firearm registrations', title_x = 1.0)
hm.update_xaxes(side="top")
hm.show()
what is the overall trend in gun purchase?: The result shows that there is increase in gun purchase over time from 1998 to 2017,the trend is seasonal which peak at december.
what state has the highest growth in gun purchase?: The result shows that Guam has the highest growth (151.444444%) in Gun registration, the total gun registration increase from 9 in 1998 to 1372 in 2017.
What census data is most associated with high gun per capital?: Both 2010 population and 2016 population census data have positive correlation with the total gun checkedin 2010 and 2016. 2010 population census has 0.605044 which is greater than 0.605044 of 2016 population census.
The main limitation is the lack of census data that prevented more detailed analysis: the census data is available only for the period 2010 - 2016. Missing data in the NICS Gun-check dataset were removed.